Session 4

R Introduction Workshop

August 01, 2017

Visualization with R

There are 3 main families of visualization functions:

Base R visualization

Basic plot syntax:
plot(x, y), where x is the vector for the x axis and y is the vector for the y axis

See ?plot

x <- 1:10 
y <- 1:10
plot(x, y)

Base R visualization: Scatterplot with iris

plot(iris$Sepal.Width, iris$Sepal.Length)

Base R visualization: Histogram with iris

hist(iris$Sepal.Width)

Base R visualization: Using par() to plot multiple plots

par(mfrow=c(1,2))
plot(iris$Sepal.Width, iris$Sepal.Length)
hist(iris$Sepal.Width)
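
One caveat with par(mfrow=...) is that the setting stays active for every later plot on the same device. A minimal sketch of one common way to restore the layout afterwards, relying on the fact that par() returns the previous values of the settings it changes. The name old_par is only illustrative.

```r
# par() invisibly returns the old values of the parameters it changes
old_par <- par(mfrow = c(1, 2))            # 1 row, 2 columns

plot(iris$Sepal.Width, iris$Sepal.Length)  # left panel
hist(iris$Sepal.Width)                     # right panel

par(old_par)                               # restore the previous layout
```

After the last line, new plots fill the whole device again (assuming that was the layout before the first call).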

plot() vs ggplot()

A picture is worth a thousand words – when the picture is good

Add layers to ggplot()

And make it interactive with ggplotly()
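
ggplotly() comes from the plotly package, not from ggplot2 itself. A minimal sketch, assuming plotly has been installed with install.packages("plotly"). The object names p and p_interactive are illustrative, and the linear smoother is just one possible extra layer.

```r
library(ggplot2)

# A static ggplot: points plus one added layer (a linear fit per species)
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Convert to an interactive widget only if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p)
  # Print p_interactive in an interactive session to explore the plot
}
```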

ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics

Installation

# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just ggplot2:
install.packages("ggplot2")

# Don't forget to load tidyverse into your R session
library(tidyverse)

# Or just ggplot2
library(ggplot2)

Usage

  1. Start with ggplot(),
    • supply a dataset
    • and aesthetic mapping using aes().
  2. You can then add on layers such as:
    • Geom (geometric object) with various geom_ functions.
    • Scales with various scale_ or labs() and lims() functions.
    • Faceting specifications with facet_ functions
    • Coordinate systems with coord_ functions
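
The recipe above can be sketched in a single call that uses one function from each family. This is only an illustration: the axis labels and the x limits are choices made for this example, not part of the workshop material.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length)) +  # 1. data + aesthetics
  geom_point(aes(color = Species)) +                       # 2. geom layer
  labs(x = "Sepal width (cm)", y = "Sepal length (cm)") +  #    scales: axis labels
  facet_wrap(~ Species) +                                  #    faceting
  coord_cartesian(xlim = c(2, 4.5))                        #    coordinate system
p
```

Each + adds one component, so any of the lines after geom_point() can be dropped without breaking the plot.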

Building a ggplot from scratch with iris

Step 0: Let’s remember the iris data

head(iris, 3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Step 1. Define data and aesthetics with aes()

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p

Step 2. Define plot type with geom_

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point()

Step 3. Assign more aesthetics

Step 3.1 Add color

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species))

Step 3.2 Add color + size

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length))

Step 3.3 Add color + size + alpha (transparency)

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width))

Step 3.4 Add color + size + alpha + shape

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species))

Step 4. Customize legend

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species)) +
    guides( color=guide_legend(ncol = 3, byrow = TRUE), 
            size=guide_legend(ncol = 3, byrow = TRUE), 
            alpha=guide_legend(ncol = 3, byrow = TRUE))

Step 5: Add more geoms: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width)) + 
    geom_smooth()

What will this give me?

Oops! What happened?

Step 5.1: Add more geoms: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + 
    geom_smooth()

Why did this work now?
Can you see the difference?

Step 5.2: Add more geoms: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + geom_smooth(aes(color=Species))

What about this? What’s happening here?
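
One way to check your answer to the Step 5 questions is to ask ggplot2 how many groups the smoothing layer was fitted to. The sketch below uses layer_data() for that. The helper name n_smooth_groups is hypothetical, the size and alpha aesthetics are left out to keep it short, and method = "loess" with formula = y ~ x is written out only to silence the message (it is what geom_smooth() picks by default for a dataset this small).

```r
library(ggplot2)

# Hypothetical helper: number of groups seen by layer 2 (the smooth layer)
n_smooth_groups <- function(p) {
  length(unique(layer_data(p, 2)$group))
}

base <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length))

# Step 5: color mapped only inside geom_point()
p5 <- base + geom_point(aes(color = Species)) +
  geom_smooth(method = "loess", formula = y ~ x)

# Step 5.1: color mapped in ggplot(), so every layer inherits it
p51 <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Step 5.2: color mapped only inside geom_smooth()
p52 <- base + geom_point() +
  geom_smooth(aes(color = Species), method = "loess", formula = y ~ x)

groups <- vapply(list(step5 = p5, step5.1 = p51, step5.2 = p52),
                 n_smooth_groups, integer(1))
groups
```

The smooth layer should report a single group for Step 5 and three groups for Steps 5.1 and 5.2: aesthetics set in ggplot() are shared by all layers, while aesthetics set inside a geom_ function apply to that layer only.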

Step 6: Faceting

Step 6.0: Create a toy dataset

Let’s generate a hypothetical iris with some added ecosystem type and precipitation data.

# sample() draws at random, so your values will differ from the output shown below
ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)

iris2 <- cbind(iris, Ecosystem=ecosys, Precipitation=precp)
head(iris2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Ecosystem
## 1          5.1         3.5          1.4         0.2  setosa    Forest
## 2          4.9         3.0          1.4         0.2  setosa  Riparian
## 3          4.7         3.2          1.3         0.2  setosa     Urban
## 4          4.6         3.1          1.5         0.2  setosa     Urban
## 5          5.0         3.6          1.4         0.2  setosa     Urban
## 6          5.4         3.9          1.7         0.4  setosa    Forest
##   Precipitation
## 1         Heavy
## 2          Mild
## 3         Heavy
## 4         Heavy
## 5          Mild
## 6         Heavy
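
Because sample() is random, the toy columns change every time the code runs. A minimal sketch of making them reproducible with set.seed(). The seed value 2017 is arbitrary and chosen only for this illustration, so the draws will not match the output printed above.

```r
set.seed(2017)  # arbitrary seed, an assumption for this example

ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp  <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
iris2  <- cbind(iris, Ecosystem = ecosys, Precipitation = precp)

# Resetting the same seed reproduces exactly the same draws
set.seed(2017)
ecosys_again <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
identical(ecosys, ecosys_again)
```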

Step 6.1: Facet iris2

Now, I would like to see how my previous graph changes across the different ecosystem types and precipitation levels.

This was the graph, with two changes:
- I am leaving out geom_smooth for now because, once the data are split into facets, each panel has too few data points for a reliable fit.
- I am also removing the alpha aesthetic to make the plot easier to read.

p2 <- ggplot(data=iris2, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p2 <- p2 + geom_point(aes(size=Petal.Length)) # + geom_smooth() 
p2
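
The remark about too few data points can be checked by counting how many flowers of each species fall into each Ecosystem x Precipitation combination. A rough sketch, rebuilding the toy data with an arbitrary seed (an assumption, so the exact counts will differ from your own iris2). The name panel_counts is illustrative.

```r
set.seed(2017)  # arbitrary seed, for illustration only
iris2 <- cbind(
  iris,
  Ecosystem     = sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE),
  Precipitation = sample(c("Heavy", "Mild"), size = 150, replace = TRUE)
)

# One row per Species x Ecosystem x Precipitation combination, with its count
panel_counts <- as.data.frame(
  table(iris2[c("Species", "Ecosystem", "Precipitation")])
)
summary(panel_counts$Freq)
```

With 50 flowers per species spread over 6 panels, each species has only about 8 points per panel on average, which is thin for a smoother.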

Now let’s add facets!

p2 + facet_grid(Ecosystem ~ Precipitation) 

I can customize the facets very easily!

p2 + facet_grid( . ~ Precipitation) 

p2 + facet_grid(Ecosystem ~ .) 

p2 + facet_grid(Precipitation ~ .) 

You get the idea here right?

Step 6.2: Facet wages

Use facet_wrap() when you want to facet by a single variable and wrap the panels into a tidy grid.

First, let’s read the wages data into R.

wages <- read.csv("./wages.csv", 
                   header = TRUE, 
                   stringsAsFactors = TRUE) 
head(wages, 3)
##       earn height    sex  race ed age
## 1 79571.30  73.89   male white 16  49
## 2 96396.99  66.23 female white 16  62
## 3 48710.67  63.77 female white 16  33

Let’s create age categories with the cut() function.

# %>% and mutate() come from dplyr, which library(tidyverse) loads
wages <- wages %>% mutate(age_cat = cut(age, breaks = seq(20, 100, by=20)) )
head(wages, 4)
##       earn height    sex  race ed age  age_cat
## 1 79571.30  73.89   male white 16  49  (40,60]
## 2 96396.99  66.23 female white 16  62  (60,80]
## 3 48710.67  63.77 female white 16  33  (20,40]
## 4 80478.10  63.22 female other 16  95 (80,100]
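
To see how cut() assigns the categories, it helps to try it on a handful of values. A small sketch with made-up ages (the vector ages is an assumption for illustration and is not taken from wages.csv):

```r
ages <- c(18, 20, 33, 49, 62, 95)                      # made-up example values
age_cat <- cut(ages, breaks = seq(20, 100, by = 20))   # same breaks as above

data.frame(ages, age_cat)
levels(age_cat)
```

By default the intervals are open on the left and closed on the right, so 33 falls in (20,40] while 18 and 20 fall outside every interval and become NA. If the data contain such ages, they end up in an NA category; the include.lowest and right arguments of cut() control this behavior.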

Let’s plot it

pw <- ggplot(wages, aes(x=height, y=earn)) +
      geom_point(aes(size=ed), alpha=0.5)
pw

pw + facet_wrap(~age_cat)

Or you can specify the number of rows or columns for the faceting with nrow and ncol

pw + facet_wrap(~age_cat, ncol=5)
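
If you do not have wages.csv at hand, the effect of nrow and ncol can be tried on iris instead. A minimal sketch, where the object names p_row and p_col are illustrative:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point()

p_row <- p + facet_wrap(~ Species, nrow = 1)  # three panels side by side
p_col <- p + facet_wrap(~ Species, ncol = 1)  # three panels stacked vertically

p_row
p_col
```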

Your turn (5 mins)

Plot the wages.csv data like the following